The joint likelihood is a simple extension of the standard likelihood formalism that enables the estimation of
common parameters across disjoint datasets. Joining the likelihood, rather than the data itself, means nuisance 
parameters can be dealt with independently. Application of this technique, particularly to Fermi-LAT dwarf 
spheroidal analyses, has already been met with great success. We present a description of the method’s general 
implementation along with a toy Monte-Carlo study of its properties and limitations. 


1. Introduction 


Several recent studies [Ackermann et al. 2011, 2012a,b, 2014a,b]
by the LAT Collaboration successfully apply the joint likelihood
technique, combining constraints for searches ranging from galaxy
cluster emission to effects of large extra dimensions. In the
following, we introduce the technique from a more generic standpoint
and compare it with other common methods of data combination. We
proceed with the aid of a toy Monte-Carlo (MC) to demonstrate the
method’s properties and explore some interesting behavior.


2. Likelihood analysis, Joint likelihood,
and basic data stacking

2.1. Likelihood


The likelihood incorporates information regarding both model and
experiment into a function whose maximization provides an estimate
of the true parameter values. It can be expressed as

$$\mathcal{L}(\alpha \mid \mathcal{D}) = P(\mathcal{D} \mid \alpha), \qquad (1)$$

where $P$ is the probability of outcome, $\mathcal{D}$, given the
parameter $\alpha$. Parameters are often separated into those of
interest, $\mu$, and nuisance, $\theta$, in order to profile or
marginalize the latter.

We will focus on the specific form of $\mathcal{L}$ to be a binned
Poisson probability function so that

$$\mathcal{L}(\mu, \theta \mid \mathcal{D}) = \prod_k \frac{\lambda_k^{n_k} e^{-\lambda_k}}{n_k!}, \qquad (2)$$

where the symbols $\lambda_k(\mu, \theta)$ and $n_k$ represent the
predicted and observed counts in a given bin, $k$. The parameters
$(\hat{\mu}, \hat{\theta})$ which yield the greatest value for
$\mathcal{L}$ are known as the maximum likelihood estimate (MLE).

When testing a hypothesis, the MLE likelihood must be compared with
that of the null hypothesis, i.e. a model lacking the effect(s) of
interest, where $\mu = \mu_0$. Typically, we compare the logarithms
of the two likelihoods with a measure called the Test Statistic:

$$\mathrm{TS} = -2 \ln\left[\frac{\mathcal{L}(\mu_0, \hat{\theta}_0 \mid \mathcal{D})}{\mathcal{L}(\hat{\mu}, \hat{\theta} \mid \mathcal{D})}\right], \qquad (3)$$

where $\hat{\theta}_0$ denotes the nuisance parameters that maximize
the likelihood under the null hypothesis.


When its distribution is known, the TS can be mapped to a p-value
associated with the alternative hypothesis. In most cases, it obeys
the asymptotic theorem [Chernoff 1954] and follows a $\chi^2/2$ with
degrees of freedom equal to the number of free signal parameters
(assuming signal is constrained to be positive). There are scenarios
(some are mentioned in Section 4 and [Ackermann et al. 2014a]) where
this does not hold, and one must derive the TS behavior from a set
of control data, e.g. with Monte-Carlo.

Figure 1: A typical delta log-likelihood profile for a
single-parameter source with no signal.

Once the distributions are known, confidence intervals can be set
for parameters by exploring the log-likelihood space surrounding the
MLE. Figure 1 illustrates a typical delta log-likelihood profile for
a simple system where there is only one free parameter that controls
the strength of the new phenomenon. Within the asymptotic regime,
confidence limits would be set at levels corresponding to the
$\chi^2$ probability density function, e.g. a difference of 2.71
from the maximum indicates 90% one-sided coverage.
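The limit-setting step can be sketched as a search for the point where the log-likelihood has dropped by a fixed amount from its maximum. The example below assumes a single bin with known background, and it applies the 2.71 threshold to $-2\Delta\ln\mathcal{L}$; both choices, along with the function names and input values, are assumptions made for illustration.

```python
import math

def loglike(mu, n, s, b):
    """Single-bin Poisson log-likelihood, dropping the constant ln(n!) term."""
    lam = mu * s + b
    return n * math.log(lam) - lam

def upper_limit(n, s, b, delta=2.71, tol=1e-9):
    """Find mu > mu_hat where -2 * delta(ln L) reaches `delta`, by bisection."""
    mu_hat = max(0.0, (n - b) / s)  # analytic single-bin MLE, constrained to mu >= 0
    ll_max = loglike(mu_hat, n, s, b)

    def excess(mu):
        return 2.0 * (ll_max - loglike(mu, n, s, b)) - delta

    lo, hi = mu_hat, mu_hat + 1.0
    while excess(hi) < 0.0:  # expand the bracket until it contains the crossing
        hi = mu_hat + 2.0 * (hi - mu_hat)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical no-signal case: 10 observed counts on an expected background of 10.
mu_ul = upper_limit(n=10, s=1.0, b=10.0)
print(f"upper limit on mu: {mu_ul:.3f}")
```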


2.2. Joint Likelihood 


To make use of a joint likelihood, one presumably has $N$ datasets
which share some signal parameter(s), $\mu$. The procedure for
joining is analogous to the way binned probabilities make up
$\mathcal{L}$: simply take the product of each set’s likelihood
[Conrad 2015]. Explicitly,

$$\mathcal{L}_{\rm joint} = \prod_{d=1}^{N} \mathcal{L}(\mu, \theta_d \mid \mathcal{D}_d). \qquad (4)$$


This construction is clean in the sense that the data sets remain
disjoint. Each could have different backgrounds, exposures, or have
even come from different instruments. All these characteristics
(nuisance parameters) are accounted for in the individual
likelihoods.
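In log space the product of likelihoods becomes a sum, and each set’s nuisance parameters can be profiled on their own before the sum is taken. The sketch below illustrates this with two hypothetical binned data sets whose only nuisance parameter is a background normalization; the data, templates, dictionary layout, and grid-based profiling are all assumptions chosen for brevity, not the procedure used in the cited analyses.

```python
import math

def loglike(mu, bkg_norm, data):
    """Poisson log-likelihood of one data set (constant ln(n!) terms dropped)."""
    total = 0.0
    for n, s, b in zip(data["counts"], data["signal"], data["background"]):
        lam = mu * s + bkg_norm * b
        total += n * math.log(lam) - lam
    return total

def profiled_loglike(mu, data, norm_grid):
    """Maximize over this set's own nuisance parameter, independently of other sets."""
    return max(loglike(mu, norm, data) for norm in norm_grid)

def joint_loglike(mu, datasets, norm_grid):
    """ln L_joint(mu) = sum over sets of the profiled log-likelihoods."""
    return sum(profiled_loglike(mu, data, norm_grid) for data in datasets)

# Two hypothetical, disjoint data sets sharing the signal strength mu.
set_a = {"counts": [12, 11, 14], "signal": [2.0, 1.0, 4.0],
         "background": [10.0, 10.0, 10.0]}
set_b = {"counts": [51, 41], "signal": [1.0, 1.0],
         "background": [50.0, 40.0]}

norm_grid = [0.5 + i / 100 for i in range(101)]  # background normalizations 0.5..1.5
mu_grid = [i / 20 for i in range(61)]            # signal strengths 0..3

mu_hat = max(mu_grid, key=lambda mu: joint_loglike(mu, [set_a, set_b], norm_grid))
print(f"joint MLE of mu: {mu_hat}")
```

The counts were chosen so that both sets are described exactly by a signal strength of one with unit background normalization, which makes the joint maximum easy to check by eye.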

2.3. Data stacking 

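Alternative methods for combining data exist, the most basic of
which is to evaluate the likelihood of the data set union. That is,
instead of

$$\prod_{d=1}^{N} \mathcal{L}(\mu, \theta_d \mid \mathcal{D}_d), \qquad (5)$$

we evaluate the stacked data likelihood:^1

$$\mathcal{L}\Big(\mu, \theta \,\Big|\, \bigcup_{d} \mathcal{D}_d\Big). \qquad (6)$$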

Here, data sets are lumped together and then the hypothesis test is
performed with respect to a model which is also the sum of
individual expectations. Switching to Pearson’s $\chi^2$ and keeping
the notation from the previous section, a stacked test statistic
might look like this:

$$\chi^2_{\rm stack}(\mu, \theta) = \frac{\left[\sum_{d,k} \left(n_{d,k} - \lambda_{d,k}(\mu, \theta)\right)\right]^2}{\sum_{d,k} \lambda_{d,k}(\mu, \theta)}. \qquad (7)$$

As before, parameters are adjusted to optimize (in this case
minimize) the $\chi^2$. Significance and confidence intervals are
directly interpreted according to the expected probability density
function.

Although easily done, it is not difficult to envision problems with
such a strategy. Data sets with weak signal-to-noise wash out when
combined with those which are larger, though not necessarily more
constraining. This method throws away information and is therefore
not optimal.

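One can do better by combining residuals, i.e.

$$\chi^2_{\rm resid}(\mu, \theta) = \sum_{d,k} \frac{\left[n_{d,k} - \lambda_{d,k}(\mu, \theta)\right]^2}{\lambda_{d,k}(\mu, \theta)}. \qquad (8)$$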


This is a much more viable alternative to the joint likelihood
method. Depending on the situation, however, its implementation can
be tricky. For example, suppose that the predicted number of events
also depends on some nuisance parameter, $\epsilon$ (e.g. time or
exposure). Uncertainties on this parameter can be accounted for by
adding an additional term to $\chi^2_{\rm resid}$ if they can be
modeled as Gaussian. If not, there is no obvious way to include them
in the data stacking approach, whereas modifying the likelihood is
straightforward for any known model of the nuisance parameter
uncertainty.
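The difference between the stacked and residual statistics described above can be seen in a two-set example. The numbers below are hypothetical and chosen so that the two sets deviate from the model in opposite directions: summing the data first hides the disagreement, while summing residuals keeps it.

```python
def chi2_stack(counts, predicted):
    """Stacked statistic: sum counts and predictions first, then compare totals."""
    total_n = sum(counts)
    total_lam = sum(predicted)
    return (total_n - total_lam) ** 2 / total_lam

def chi2_resid(counts, predicted):
    """Residual statistic: compare each set with its own prediction, then sum."""
    return sum((n - lam) ** 2 / lam for n, lam in zip(counts, predicted))

# Hypothetical single-bin sets: one over-predicted, one under-predicted.
counts = [110, 990]
predicted = [100.0, 1000.0]

stack = chi2_stack(counts, predicted)
resid = chi2_resid(counts, predicted)
print(stack, resid)
```

Here the stacked totals agree exactly, so the stacked statistic vanishes even though the smaller set sits one standard deviation from its prediction.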


3. Properties 
3.1. Toy Model 

To illustrate the fundamental properties of the method, we employ a
simple toy MC model for combining constraints: single-bin data sets
with Poisson counts generated according to

$$\lambda_d = \mu \, s_d + b_d \qquad (9)$$
$$\vec{\mu} = \{\mu, s_d\}$$
$$\vec{\theta} = \{b_d\}$$

Each set may have a different number of total events and has a
background determined by the nuisance parameter, $b_d$. Signal is
determined from an individual, $s_d$, and common scale factor
parameter, $\mu$. The latter is the value we wish to estimate.

^1 A common example would be the addition of counts maps.

Figure 2: Confidence intervals on the shared signal parameter,
$\mu$, derived for two single-bin data sets with
signal-to-background ratios of 1:1 and 1:10. For an increasing
number of total counts, 1000 MC realizations determine the median
intervals for each method of combination. Bands represent the 68%
containment among realizations.
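A toy generator along these lines can be sketched with the Python standard library alone. The Poisson sampler below uses Knuth’s multiplication method, which is adequate for the modest expectations considered here; the function names, the seed, and the choice of 100 expected events per set are assumptions for illustration rather than details of the study.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw one Poisson variate with mean lam (Knuth's method; fine for modest lam)."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def generate_sets(mu, signals, backgrounds, rng):
    """One realization: a count per single-bin set with mean lambda_d = mu*s_d + b_d."""
    return [poisson_draw(mu * s + b, rng) for s, b in zip(signals, backgrounds)]

rng = random.Random(12345)  # fixed seed so the example is reproducible

# Two sets with signal-to-background 1:1 and 1:10, each with 100 expected events
# at mu = 1 (an assumed normalization).
signals = [50.0, 100.0 / 11.0]
backgrounds = [50.0, 1000.0 / 11.0]

counts = generate_sets(1.0, signals, backgrounds, rng)
print(counts)
```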


3.2. Confidence Interval, Coverage, and Power

As a starting point, we investigate the combination of two sets with
signal-to-background ratios of 1:1 and 1:10. Fig. 2 illustrates how
the confidence intervals behave as a function of total events for
the $\chi^2_{\rm stack}$, $\chi^2_{\rm resid}$, and
$\mathcal{L}_{\rm joint}$ formulations. In all scenarios, the
coverage adheres to the nominal value and the limits improve in
approximate proportion to the square root of the set size. As
expected, we see that the $\chi^2$ formed from residuals
out-performs the simple stack (by yielding a tighter interval), and
matches the joint likelihood.

The TS distribution of the two-set joint likelihood is depicted in
Fig. 3. Note that the distribution is halved (with the remainder
stacked at zero TS) when the signal parameter is constrained to be
greater than zero.
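The pile-up at zero TS can be reproduced with a short simulation. The sketch below generates background-only realizations of two single-bin sets, fits the shared signal strength under the positivity constraint, and counts how often the fit lands on the boundary. The backgrounds are treated as known, and the set sizes, seed, and helper names are assumptions for illustration.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method for a Poisson variate with mean lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def joint_loglike(mu, counts, signals, backgrounds):
    """Sum of single-bin Poisson log-likelihoods (constant terms dropped)."""
    return sum(n * math.log(mu * s + b) - (mu * s + b)
               for n, s, b in zip(counts, signals, backgrounds))

def fit_mu_positive(counts, signals, backgrounds):
    """MLE of mu subject to mu >= 0, by bisection on the log-likelihood slope."""
    def slope(mu):
        return sum(n * s / (mu * s + b) - s
                   for n, s, b in zip(counts, signals, backgrounds))
    if slope(0.0) <= 0.0:
        return 0.0  # likelihood decreases from the boundary: best fit is mu = 0
    lo, hi = 0.0, sum(counts) / sum(signals)  # slope(hi) < 0 by construction
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(2014)
signals = [50.0, 100.0 / 11.0]
backgrounds = [50.0, 1000.0 / 11.0]

ts_values = []
for _ in range(2000):
    counts = [poisson_draw(b, rng) for b in backgrounds]  # null hypothesis: mu = 0
    mu_hat = fit_mu_positive(counts, signals, backgrounds)
    ts = 2.0 * (joint_loglike(mu_hat, counts, signals, backgrounds)
                - joint_loglike(0.0, counts, signals, backgrounds))
    ts_values.append(max(0.0, ts))

frac_zero = sum(1 for ts in ts_values if ts == 0.0) / len(ts_values)
print(f"fraction of realizations with TS = 0: {frac_zero:.3f}")
```

Roughly half of the background-only realizations fluctuate downward and are pinned at the boundary, which is the origin of the spike at zero TS.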


3.3. Effects of Additional Data Sets 

Increasing the number of data sets comprising the joint likelihood
naturally improves the power and tightens the limits, albeit at a
rate dependent on their signal-to-noise ratios. As long as the model
uncertainties remain consistent, sets can be added indefinitely with
no ill effect on the sensitivity. As an example, see Fig. 4, where
95% confidence upper limits are calculated with a cumulative number
of toy-model sets. Each set is identical, with a
signal-to-background ratio of 1:10 and 100 total events. In this
regime, limits improve with the square root of the number of sets,
$N$.

Figure 3: TS distribution of two-set joint likelihood for both
unconstrained and positive-only signal fits, along with the
corresponding expected asymptotic distributions.

Figure 4: Behavior of toy-model upper limits with the addition of
sets. Signal-to-background is 1:10 with 100 total events.
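The square-root scaling can be checked numerically under an idealization that is not part of the toy study: every set is assigned an observed count exactly equal to its background expectation, so the joint log-likelihood is simply $N$ copies of one profile. The threshold of 2.71 on $-2\Delta\ln\mathcal{L}$ and the function name are likewise assumptions.

```python
import math

def joint_upper_limit(n_sets, n, s, b, delta=2.71):
    """Upper limit on mu from n_sets identical single-bin sets, background fixed."""
    def loglike(mu):
        lam = mu * s + b
        return n_sets * (n * math.log(lam) - lam)  # sum of identical log-likelihoods

    mu_hat = max(0.0, (n - b) / s)
    ll_max = loglike(mu_hat)

    lo, hi = mu_hat, mu_hat + 1.0
    while 2.0 * (ll_max - loglike(hi)) < delta:  # expand until the crossing is bracketed
        hi = mu_hat + 2.0 * (hi - mu_hat)
    for _ in range(80):                          # bisection on the crossing point
        mid = 0.5 * (lo + hi)
        if 2.0 * (ll_max - loglike(mid)) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Signal-to-background 1:10 with 100 expected events per set at mu = 1.
s, b = 100.0 / 11.0, 1000.0 / 11.0

limits = {n_sets: joint_upper_limit(n_sets, n=b, s=s, b=b) for n_sets in (1, 4, 16)}
for n_sets, ul in limits.items():
    print(n_sets, round(ul, 3), round(ul * math.sqrt(n_sets), 3))
```

Quadrupling the number of sets roughly halves the limit, so the product of the limit and the square root of the number of sets stays nearly constant.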

In certain situations the joint upper limits can improve even more
rapidly. Any time $\ln[\mathcal{L}_{\rm joint}] \propto \mu$ holds
throughout the allowed range of $\mu$, the constraints scale in
direct proportion to $N$. For example, a very low background might
give a Poissonian likelihood, resulting in linear log-space
behavior. Forming the joint likelihood in log-space consists of
adding these profiles together. For the case of a set of linear
functions, the limit level is then inversely proportional to the sum
of the slopes, i.e.

$$\mu_{\rm UL} \propto \left[\sum_{d=1}^{N} \left.\frac{\partial \ln \mathcal{L}_d}{\partial \mu}\right|_{\mu=0}\right]^{-1}. \qquad (10)$$

The sum reduces to $N$ times the common slope in the case of a set
of profiles with identical slopes. See Fig. 5, where a low
background induces an appreciable effect.

Figure 5: Behavior of upper limits with the addition of data sets,
with very low background [(s,b) = (1,0.1)] and the constraint that
s > 0.
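The linear regime can be made explicit with a short worked case. As a simplifying assumption that is not taken from the toy study, suppose every set observes zero counts ($n_d = 0$), so that each Poisson log-likelihood is exactly linear in $\mu$ and is maximized at the boundary $\mu = 0$. Writing $\Delta$ for the chosen drop in $\ln\mathcal{L}$ that defines the limit:

```latex
\begin{align}
\ln \mathcal{L}_d(\mu) &= -(\mu s_d + b_d)
  && (n_d = 0), \\
\ln \mathcal{L}_{\rm joint}(\mu) &= -\mu \sum_{d=1}^{N} s_d - \sum_{d=1}^{N} b_d , \\
\ln \mathcal{L}_{\rm joint}(0) - \ln \mathcal{L}_{\rm joint}(\mu_{\rm UL}) = \Delta
  \;&\Longrightarrow\;
  \mu_{\rm UL} = \frac{\Delta}{\sum_{d=1}^{N} s_d}
  \;\xrightarrow{\; s_d = s \;}\; \frac{\Delta}{N s} .
\end{align}
```

In this idealized case the limit falls as $1/N$ rather than $1/\sqrt{N}$, because the profile has no curvature to average down.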



4. Caveats


4.1. Overlapping Data Sets

It is best to avoid overlap between data sets. If they do overlap,
then where there is signal, the TS will be erroneously increased by
double-counting (Figure 6), approximately in direct proportion to
the percentage of overlap [see also the appendix of Ackermann et al.
2014b]. When constructing a TS distribution, the significance
derived from low-probability fluctuations will be similarly
inflated. See Figure 7, where this is demonstrated using the
preceding toy model. The upward skew there indicates that type II
errors are more common than usual, effectively lowering the
sensitivity of the study.

Figure 6: The inflation of TS as two equivalent data sets gradually
overlap.

Figure 7: Effect on the null distribution from a 50% overlap
correlation between a two-set joint likelihood.


5. Discussion and Conclusions


The technique of joint likelihood, already widely used among
Fermi-LAT Collaboration analyses, provides a straightforward and
universal tool for combining constraints from astrophysical targets
and other disjoint data sets. We demonstrate that it matches the
performance of residual stacking, and note that it often requires
less effort to implement. We model and describe the method’s
behavior in two interesting regimes: first for very low background
and second for the case of overlapping data sets. The possible
applications of the technique have by no means been exhausted and we
encourage its continued use. Lastly, we plan to expand on studies of
the method’s behavior in an upcoming publication.


Acknowledgments


The authors wish to thank JACoW for their guidance in preparing this
template.

Work supported by Department of Energy contract DE-AC03-76SF00515.


References 


M. Ackermann et al. (Fermi-LAT Collaboration),
Phys. Rev. Lett. 107, 241302 (2011), 1108.3546.

M. Ackermann et al. (Fermi-LAT Collaboration),
Science 338, 1190 (2012a), 1211.1671.

M. Ackermann et al. (Fermi-LAT Collaboration),
JCAP 2, 012 (2012b), 1201.2460.

M. Ackermann et al. (Fermi-LAT Collaboration),
Phys. Rev. D89, 042001 (2014a), 1310.0828.

M. Ackermann et al. (Fermi-LAT Collaboration),
ApJ 787, 18 (2014b), 1308.5654.

H. Chernoff, Ann. Math. Statist. 25, 573 (1954), URL
http://www.jstor.org/stable/2236839.

J. Conrad, Astroparticle Physics 62, 165 (2015),
1407.6617.


eConf C141020.1 



























